A colleague sampled the occurrence (0 or 1) of Canada lynx at 75 grid cells using camera traps. They would like your help to 1) fit a model that estimates the occurrence probability of lynx and investigates the influence of two variables, and 2) inform future study design scenarios.
A colleague used camera traps to sample whether a lynx was present
(1) or assumed absent (0) at each grid cell or ‘site’ (y)
during the winter (December to February); we will assume there are no
false positives or false negatives in these data. They designed the
sampling and site selection so that they had variation in two important
covariates: the distance from the camera to a road
(dist.road) and the percentage of forest cover
(cover). Their hypothesis is that lynx will avoid human
activity by occurring farther from roads when they are not under cover,
but will occur near roads that are under cover because they are able to
remain hidden there.
Fit the data (lynx.data.csv) using a model that captures
the hypothesis of your colleague.
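One way to read the hypothesis is that the effect of dist.road depends on cover, which suggests a logistic regression with an interaction term: logit(p) = b0 + b1*dist.road + b2*cover + b3*dist.road*cover. The sketch below is a minimal illustration of that reading, not a prescribed solution. It fits the model by Newton-Raphson using only NumPy and reports Wald p-values. The column names (y, dist.road, cover) are assumed from the description above, and the demo uses synthetic data with made-up coefficients because the contents of lynx.data.csv are not shown here.

```python
import csv
import math
import numpy as np

def load_lynx(path):
    """Read y, dist.road, and cover from a CSV (column names are ASSUMED)."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    y = np.array([float(r["y"]) for r in rows])
    dist_road = np.array([float(r["dist.road"]) for r in rows])
    cover = np.array([float(r["cover"]) for r in rows])
    return y, dist_road, cover

def design_matrix(dist_road, cover):
    """Columns: intercept, dist.road, cover, dist.road x cover."""
    d = np.asarray(dist_road, dtype=float)
    c = np.asarray(cover, dtype=float)
    return np.column_stack([np.ones_like(d), d, c, d * c])

def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns estimates, standard errors, and two-sided Wald p-values.
    """
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information at the final estimates for the standard errors.
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    pvals = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, se, pvals

# Demo on SYNTHETIC data; true_beta is hypothetical, not an estimate from the lynx data.
rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0, 0.8, -1.2])
X_demo = design_matrix(rng.standard_normal(2000), rng.standard_normal(2000))
y_demo = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X_demo @ true_beta)))).astype(float)
beta_demo, se_demo, p_demo = fit_logistic(X_demo, y_demo)
for name, b, s, pv in zip(["intercept", "dist.road", "cover", "dist.road:cover"],
                          beta_demo, se_demo, p_demo):
    print(f"{name:>16s}  estimate={b: .3f}  se={s:.3f}  p={pv:.4f}")
```

To use the real data, replace the synthetic demo with `y, d, c = load_lynx("lynx.data.csv")` followed by `fit_logistic(design_matrix(d, c), y)`. Centering or scaling the covariates first may make the coefficients easier to interpret.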
Your colleague would like these data to inform whether they can sample fewer grid cells/sites in the future. Specifically, they want to know whether sampling only 50 sites provides enough statistical power to reject, for each estimated coefficient, the null hypothesis that the coefficient equals zero at a type I error rate of 0.05. The power they want to achieve is 0.80. Note: in a simulation context, think of obtaining the sampling distribution of the p-value for each coefficient and evaluating whether the proportion of p-values below 0.05 is greater than or equal to 0.80.
Use the estimated coefficients to simulate many data sets (>1000), each with a sample of 50 sites. Fit each data set using the same model you used to fit the empirical data. Extract the p-value of each coefficient to evaluate whether there is adequate statistical power to meet your colleague's target.
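The simulation can be sketched as follows, under several loudly stated assumptions: the coefficient vector `beta_hat` is a hypothetical placeholder (substitute the estimates from your own fit), the covariates are drawn as standardized normal variables (resampling 50 rows of the observed covariates would be a reasonable alternative), and the model is the interaction logistic regression fitted by Newton-Raphson with Wald p-values. Power for each coefficient is estimated as the proportion of simulated data sets in which its p-value falls below 0.05.

```python
import math
import numpy as np

def wald_pvalues(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; return two-sided Wald p-values."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])

def estimate_power(beta, n_sites=50, n_sims=2000, alpha=0.05, seed=1):
    """Proportion of simulated data sets in which each coefficient has p < alpha."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    rejections = np.zeros(beta.size)
    for _ in range(n_sims):
        # ASSUMPTION: standardized normal covariates for the future sites.
        d = rng.standard_normal(n_sites)
        c = rng.standard_normal(n_sites)
        X = np.column_stack([np.ones(n_sites), d, c, d * c])
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        y = rng.binomial(1, prob).astype(float)
        try:
            pvals = wald_pvalues(X, y)
        except np.linalg.LinAlgError:
            continue  # a failed fit is counted as no rejection
        rejections += pvals < alpha  # NaN p-values also count as no rejection
    return rejections / n_sims

# HYPOTHETICAL coefficients: intercept, dist.road, cover, dist.road:cover.
beta_hat = np.array([-0.5, 1.0, 0.8, -1.2])
power = estimate_power(beta_hat, n_sites=50, n_sims=2000, alpha=0.05)
for name, pw in zip(["intercept", "dist.road", "cover", "dist.road:cover"], power):
    print(f"{name:>16s}: power = {pw:.3f}  adequate (>= 0.80) = {pw >= 0.80}")
```

Counting failed or degenerate fits as non-rejections is a conservative design choice; with only 50 sites and strong effects, some simulated data sets may show separation, so it is worth checking how often that happens before trusting the power estimates.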